6.2 Constraint


More formally, the relative entropy (Kullback–Leibler divergence)¹⁶ between two
(discrete) distributions with probability functions $a_k$ and $b_k$ is

$$\mathcal{R}(a, b) = \sum_k a_k \log_2 \left( a_k / b_k \right) \,. \qquad (6.19)$$

If $a_k$ is an actual distribution of observations, and $b_k$ is a model description
approximating the data,¹⁷ then $\mathcal{R}(a, b)$ is the expected difference (expressed as the number
of bits) between encoding samples from $a_k$ using a code based on $b$ and using a code
based on $a$. This can be seen by writing Eq. (6.19) as

$$\mathcal{R}(a, b) = -\sum_k a_k \log_2 b_k + \sum_k a_k \log_2 a_k \,, \qquad (6.20)$$

where the first term on the right-hand side is called the cross-entropy of $a_k$ and $b_k$:
the expected number of bits required to encode observations from $a$ when using a
code based on $b$ rather than $a$. Conversely, $\mathcal{R}(a, b)$ is the gain in information if a
code based on $a$ rather than $b$ is used.

Suppose that $P\{x_1, x_2, \ldots, x_m\}$ is the probability of having a certain pattern
(arrangement), or $m$-gram, $x_1, x_2, \ldots, x_m$,¹⁸ assumed to be ergodic (stationary
stochastic).¹⁹ Examples could be the English texts studied by Shannon; of partic-
ular relevance to the topic of this book is the problem of predicting the nucleic acid
base following a known (sequenced) arrangement. The conditional probability²⁰ that
the pattern [the $(m-1)$-gram $x_1, x_2, \ldots, x_{m-1}$] is followed by the symbol $x_m$ is

$$P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} = \frac{P\{x_1, x_2, \ldots, x_{m-1}, x_m\}}{P\{x_1, x_2, \ldots, x_{m-1}\}} \,. \qquad (6.21)$$
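In practice the probabilities in Eq. (6.21) must be estimated from observed frequencies. The sketch below (an assumption, not from the text) uses the standard maximum-likelihood estimate — the count of the $m$-gram divided by the count of its $(m-1)$-gram context — illustrated on a short nucleic acid sequence:

```python
from collections import Counter

def conditional_prob(seq, context, symbol):
    """Estimate P{x_m | x_1, ..., x_{m-1}} from m-gram counts, Eq. (6.21)."""
    m = len(context) + 1
    # Count all m-grams in the sequence
    mgrams = Counter(tuple(seq[i:i + m]) for i in range(len(seq) - m + 1))
    # Total occurrences of the context among m-gram prefixes
    context_total = sum(c for g, c in mgrams.items() if g[:-1] == context)
    return mgrams[context + (symbol,)] / context_total

# Probability that the 2-gram "TG" is followed by "A" in a toy sequence
p = conditional_prob("ATGATGCTG", ("T", "G"), "A")
```

Here "TG" occurs twice with a successor ("TGA" and "TGC"), so the estimate is 1/2.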

The "$m$-length approximation" to the entropy, $S_m$, defined as the average uncertainty
about the next symbol, is

$$S_m = -\sum_{x_1, x_2, \ldots, x_{m-1}} P\{x_1, x_2, \ldots, x_{m-1}\} \sum_{x_m} P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} \log P\{x_m \mid x_1, x_2, \ldots, x_{m-1}\} \,. \qquad (6.22)$$
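Since $P\{x_1, \ldots, x_{m-1}\} \, P\{x_m \mid x_1, \ldots, x_{m-1}\} = P\{x_1, \ldots, x_m\}$, the double sum in Eq. (6.22) collapses to a single sum over $m$-grams. A minimal sketch exploiting this (an illustration with empirical frequencies, not the author's code), using base-2 logarithms so the result is in bits:

```python
import math
from collections import Counter

def block_entropy_rate(seq, m):
    """m-length approximation S_m of Eq. (6.22), in bits per symbol."""
    n = len(seq) - m + 1
    mgrams = Counter(tuple(seq[i:i + m]) for i in range(n))
    # Context counts: occurrences of each (m-1)-gram among m-gram prefixes
    prefixes = Counter(g[:-1] for g in mgrams.elements())
    s = 0.0
    for g, c in mgrams.items():
        p_joint = c / n                   # P{x_1, ..., x_m}
        p_cond = c / prefixes[g[:-1]]     # P{x_m | x_1, ..., x_{m-1}}
        s -= p_joint * math.log2(p_cond)
    return s
```

For a perfectly periodic sequence such as "ABABABAB", $S_2$ is zero (the next symbol is fully determined by the previous one), while $S_1$ equals the ordinary symbol entropy of one bit.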

16 Although sometimes called a "distance", since $\mathcal{R}(a, b) \neq \mathcal{R}(b, a)$ it is not a true metric and is

therefore better called a divergence rather than a distance.

17 Possibly constructed a priori.

18 See also Sect. 13.1.

19 See Sect. 11.1.

20 See Sect. 9.2.2.